Abstract

A more sums than differences (MSTD) set is a finite subset $S$ of the integers such that
$|S + S| > |S - S|$. We construct a new dense family of MSTD subsets of $\{0, 1, 2, \dots, n - 1\}$.
Our construction gives $\Theta(2^n/n)$ MSTD sets, improving the previous best construction with
$\Omega(2^n/n^4)$ MSTD sets by Miller, Orosz, and Scheinerman.
1 Introduction

A more sums than differences (MSTD) set is a finite set $S$ of integers with $|S + S| > |S - S|$, where
the sum set $S + S$ and the difference set $S - S$ are defined as

$S + S = \{s_1 + s_2 : s_1, s_2 \in S\}$,
$S - S = \{s_1 - s_2 : s_1, s_2 \in S\}$.

Since addition is commutative while subtraction is not, two distinct integers $s_1$ and $s_2$ generate one
sum but two differences. This suggests that $S + S$ should "usually" be smaller than $S - S$. Thus
we expect MSTD sets to be rare.

The first example of an MSTD set was found by Conway in the 1960s: $\{0, 2, 3, 4, 7, 11, 12, 14\}$.
The name MSTD was later given by Nathanson [8]. MSTD sets have recently become a popular
research topic. We refer the reader to [8] for the history of the problem.
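
The definitions above are easy to check by machine. The following is a minimal Python sketch (the helper names `sumset`, `diffset`, and `is_mstd` are ours, chosen for illustration) that computes both sets for Conway's example.

```python
def sumset(s):
    """Return S + S = {a + b : a, b in S}."""
    return {a + b for a in s for b in s}


def diffset(s):
    """Return S - S = {a - b : a, b in S}."""
    return {a - b for a in s for b in s}


def is_mstd(s):
    """True if S has strictly more sums than differences."""
    return len(sumset(s)) > len(diffset(s))


conway = {0, 2, 3, 4, 7, 11, 12, 14}
print(len(sumset(conway)), len(diffset(conway)))  # prints: 26 25
```

Here the sums 1, 20, and 27 are missing from $[0, 28]$, while the differences $\pm 6$ and $\pm 13$ are missing from $[-14, 14]$.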

Let $\rho_n 2^n$ be the number of MSTD subsets of $\{0, 1, 2, \dots, n - 1\}$. We refer to $\rho_n$ informally as
the density of the family of MSTD sets. This quantity was first studied by Martin and O'Bryant
[5], who showed that $\rho_n > 2 \times 10^{-7}$ for $n > 14$. However, this bound is far from optimal. Recently,
the author [17] showed that $\rho_n$ converges to a limit, and computed a lower bound of $4 \times 10^{-4}$ for
this limit. From Monte Carlo experiments, we expect the limiting density to be about $4.5 \times 10^{-4}$ [5].

The proofs of the lower bounds on $\rho_n$ are non-constructive. On the other hand, infinite families
of MSTD sets were constructed by Hegarty, Nathanson [8], and Miller, Orosz, and Scheinerman
[6]. In particular, Miller et al. gave the densest construction in terms of the number of subsets of
$\{0, 1, \dots, n - 1\}$; their construction has density $\Omega(1/n^4)$.

In this paper, we offer a new construction of an infinite family of MSTD sets. Our construction,
described in Section 2, has density $\Theta(1/n)$, improving the previous result of Miller et al. [6]. In
Section 3 we prove that our family of MSTD sets has the claimed size. In the process we introduce
a new combinatorial object called a bidirectional ballot sequence, whose additional properties are
discussed in Section 4.



2 Construction of MSTD sets 



We use $[a, b]$ to denote the set $\{a, a + 1, \dots, b\}$. In this section we describe our construction of a
new family of MSTD subsets of $[0, n - 1]$.

The first idea used in our construction is similar to the techniques used in both [5] and [6];
namely we look for sets of the form

$S = L \cup M \cup R$,

where

$L = S \cap [0, \ell - 1]$,
$M = S \cap [\ell, n - r - 1]$,
$R = S \cap [n - r, n - 1]$.

We will fix $L$ and $R$ to be sets with certain desirable properties and let $M$ vary.

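[Figure 1: Illustration of the construction of S. The set S consists of the blocks L, M, R; the sum set S + S consists of L + L, an undetermined middle part, and R + R; the difference set S - S consists of L - R, an undetermined middle part, and R - L.]
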
For instance, adapting the construction from [5] and taking $\ell = r = 11$ and

$L = \{0, 2, 3, 7, 8, 9, 10\}$,    (1)
$R = \{n - 11, n - 10, n - 9, n - 6, n - 3, n - 2, n - 1\}$,    (2)

we have

$L + L = [0, 20] \setminus \{1\}$,    $R + R = [2n - 22, 2n - 2]$.



On the other hand, $S - S$ is missing at least two differences, namely $\pm(n - 7)$, so $|S - S| \le 2n - 3$.
If we can get $S + S$ to contain $[21, 2n - 23]$ (i.e., all the middle sums not yet covered by $L + L$ or
$R + R$), then $S + S$ is only missing the sum 1, and thus $|S + S| = 2n - 2$, thereby making $S$ an
MSTD set.
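
The claims about the two fringes can be checked directly. The sketch below (with hypothetical helper names `fringe_sets` and `check_fringes`, and an arbitrary illustrative value of $n$) verifies $L + L$, $R + R$, and the absence of the difference $n - 7$ from $R - L$; for $n > 17$ a difference of size $n - 7$ can only arise between an element of $R$ and an element of $L$.

```python
def fringe_sets(n):
    """The fixed fringes L and R of equations (1) and (2)."""
    L = {0, 2, 3, 7, 8, 9, 10}
    R = {n - 11, n - 10, n - 9, n - 6, n - 3, n - 2, n - 1}
    return L, R


def check_fringes(n):
    """Return three booleans: L+L, R+R, and the missing difference n-7."""
    L, R = fringe_sets(n)
    LL = {a + b for a in L for b in L}
    RR = {a + b for a in R for b in R}
    RL = {a - b for a in R for b in L}
    return (
        LL == set(range(0, 21)) - {1},            # L + L = [0, 20] \ {1}
        RR == set(range(2 * n - 22, 2 * n - 1)),  # R + R = [2n-22, 2n-2]
        (n - 7) not in RL,                        # n - 7 is not in R - L
    )


print(check_fringes(40))  # prints: (True, True, True)
```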

So our goal is to choose $M$ so that $S + S$ is not missing any sums in the middle segment, i.e.,
$[21, 2n - 23]$. From the probabilistic argument of [5], we know that the set of all $M$'s with this
property occupies a positive lower density of all subsets of $[11, n - 12]$. However, that proof is
non-constructive.

Note that if $M + M$ is not missing any sums (i.e., $M + M = [2 \cdot 11, 2(n - 12)]$), then $S$ has the
desired properties. This condition forces $11, n - 12 \in M$, so that $21, 2n - 23 \in S + S$ as well. Let
us temporarily do some re-indexing so that the problem becomes finding subsets $M$ of $[1, m]$ such
that $M + M = [2, 2m]$. Note that the probabilistic argument of [5] also shows that the set of such
$M$'s has at least positive constant density.

The construction of [6] is as follows: let $M$ contain all $k$ elements on each of its two ends (i.e.,
$[1, k] \cup [m - k + 1, m] \subseteq M$), and furthermore let $M$ have the property that it does not have a run
of more than $k$ consecutive missing elements. Here $k$ is allowed to vary. This construction gives a
density of $\Omega(1/n^4)$.

We use a different approach to construct M. The property of M that we seek is the following: 
for every prefix and suffix of [1, m], more than half of the elements are in M. The following lemma 
proves that this constraint is sufficient for our purposes. 

Lemma 2.1. If $M \subseteq [1, m]$ satisfies

$|M \cap [1, k]| > \frac{k}{2}$ and $|M \cap [m - k + 1, m]| > \frac{k}{2}$

for every $1 \le k \le m$, then $M + M = [2, 2m]$.

Proof. Let $2 \le x \le 2m$. If $x \le m$, then $M$ contains more than half of the elements in $[1, x - 1]$,
and so does its reflection $\{x - y : y \in M\}$. By the pigeonhole principle these two sets intersect, so
there is some $y$ with $y, x - y \in M$, and hence $x \in M + M$. Similarly, if $x > m$, then since $M$
contains more than half of the elements in $[x - m, m]$, we can find some $x - y, y \in M$, so that
$x \in M + M$ as well. □
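
As a sanity check on Lemma 2.1, the following sketch enumerates all subsets of $[1, m]$ for small $m$ and confirms that every subset satisfying the prefix and suffix condition has full sum set. The function names are ours, and the exhaustive search is only feasible for small $m$.

```python
from itertools import combinations


def satisfies_majority(M, m):
    """Every prefix [1, k] and suffix [m-k+1, m] has more than k/2 elements of M."""
    for k in range(1, m + 1):
        prefix = sum(1 for x in M if x <= k)
        suffix = sum(1 for x in M if x >= m - k + 1)
        if 2 * prefix <= k or 2 * suffix <= k:
            return False
    return True


def lemma_holds(m):
    """Check M + M = [2, 2m] for every admissible M inside [1, m]."""
    full = set(range(2, 2 * m + 1))
    for size in range(m + 1):
        for M in combinations(range(1, m + 1), size):
            if satisfies_majority(M, m):
                if {a + b for a in M for b in M} != full:
                    return False
    return True


print(all(lemma_holds(m) for m in range(1, 11)))
```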

The construction of this new family of MSTD sets is summarized in the theorem below. 

Theorem 2.2. Let $n > 24$. Moreover, let $M$ be a subset of $[11, n - 12]$ with the property that every
prefix and every suffix of the interval $[11, n - 12]$ has more than half of its elements in $M$. Then
$S = L \cup M \cup R$ is an MSTD set, where $L$ and $R$ are given in (1) and (2). The number of MSTD
sets of $\{0, 1, \dots, n - 1\}$ in this family is $\Theta(2^n/n)$.

To prove the last assertion in the theorem, we need to count the number of sets in our family. 
This is done in the next section. 
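
For small $n$ the family in Theorem 2.2 can be generated exhaustively. The sketch below is illustrative only: the function names are ours, the enumeration is exponential in $n$, and membership of $11, \dots, n - 12$ in $M$ is encoded as a 0-1 tuple. It builds every $S = L \cup M \cup R$ and reports the sizes of $S + S$ and $S - S$.

```python
from itertools import product

L = {0, 2, 3, 7, 8, 9, 10}


def right_fringe(n):
    """The set R of equation (2)."""
    return {n - 11, n - 10, n - 9, n - 6, n - 3, n - 2, n - 1}


def is_bidirectional(bits):
    """Every nonempty prefix and suffix has strictly more 1's than 0's."""
    for seq in (bits, bits[::-1]):
        height = 0
        for bit in seq:
            height += 1 if bit else -1
            if height <= 0:
                return False
    return True


def family(n):
    """All sets S = L | M | R of the theorem for one (small) n."""
    sets = []
    for bits in product((0, 1), repeat=n - 22):
        if is_bidirectional(bits):
            M = {11 + i for i, bit in enumerate(bits) if bit}
            sets.append(L | M | right_fringe(n))
    return sets


def sum_and_diff_sizes(S):
    """Return (|S + S|, |S - S|)."""
    sums = {a + b for a in S for b in S}
    diffs = {a - b for a in S for b in S}
    return len(sums), len(diffs)


n = 30  # illustrative choice
for S in family(n):
    print(sorted(S), sum_and_diff_sizes(S))
```

In each case one expects $|S + S| = 2n - 2$ and $|S - S| \le 2n - 3$, as in the discussion preceding the theorem.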

3 Bidirectional ballot sequence 

In order to study the sizes of our new families of MSTD sets, we introduce the following
combinatorial construction.

Definition 3.1. A 0-1 sequence of length $n$ is a bidirectional ballot sequence if every prefix and
suffix contains strictly more 1's than 0's. The number of bidirectional ballot sequences of length $n$
is denoted $B_n$.

Recall that a classical ballot sequence is a 0-1 sequence where we only require that every prefix
has more 1's than 0's. A bidirectional ballot sequence is then a ballot sequence whose reverse is
also a ballot sequence. This construction appears to be new. Table 1 gives some values of $B_n$. At
the time of this writing, the sequence $B_n$ was not found in Sloane's On-Line Encyclopedia of
Integer Sequences [15].



Table 1: Number of bidirectional ballot sequences of length n.

n      1    2    3    4    5    6    7    8    9   10   11   12
B_n    1    1    1    1    2    3    5    9   15   28   49   91

n       13     14     15     16     17     18     19     20     21     22     23      24
B_n    166    307    574   1065   2016   3769   7176  13532  25842  49113  93995  179775
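
The entries of Table 1 can be recomputed with a short dynamic program. The sketch below is our own illustration, not the exact formula mentioned later in the paper: it sums over the final height $H$ and counts walks whose intermediate heights stay strictly between 0 and $H$.

```python
def count_bidirectional(n):
    """Number of bidirectional ballot sequences of length n (n >= 1)."""
    if n == 1:
        return 1
    total = 0
    for H in range(n, 1, -2):  # final heights with the same parity as n
        # ways[h] = number of partial walks at height h, all heights in [1, H-1]
        ways = [0] * (H + 1)
        ways[1] = 1  # the first step must go up to height 1
        for _ in range(n - 2):  # steps 2, ..., n-1 stay inside [1, H-1]
            new = [0] * (H + 1)
            for h in range(1, H):
                if ways[h]:
                    if h + 1 <= H - 1:
                        new[h + 1] += ways[h]
                    if h - 1 >= 1:
                        new[h - 1] += ways[h]
            ways = new
        total += ways[H - 1]  # the last step goes up from H-1 to H
    return total


print([count_bidirectional(n) for n in range(1, 25)])
```

The printed list can be compared entry by entry with Table 1.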



[Figure 2: A bidirectional ballot walk corresponding to the sequence 11011011010011111011. The middle dashed line divides the walk into two halves.]

It is easy to see that the possibilities for the set $M$ in the construction in Theorem 2.2
correspond bijectively with bidirectional ballot sequences of length $n - 22$. Then, the proof of the final
assertion in the theorem is equivalent to the following result about the number of bidirectional
ballot sequences of a given length.

Proposition 3.2. The number of bidirectional ballot sequences satisfies $B_n = \Theta(2^n/n)$.



The rest of this section contains a proof of Proposition 3.2.

We can interpret 0-1 sequences in terms of lattice walks, where we start at the origin and take
steps of the form $(1, 1)$ and $(1, -1)$, corresponding to the terms 1 and 0 in the sequence, respectively.
Let a ballot walk (resp. bidirectional ballot walk) be such a lattice walk corresponding to a ballot
sequence (resp. bidirectional ballot sequence). So, a ballot walk is a lattice walk with the property
that the starting point is the unique lowest point, and a bidirectional ballot walk has the additional
property that the ending point is the unique highest point. See Figure 2 for an example.

The key idea in the proof of Proposition 3.2 is to divide a bidirectional ballot walk into two
halves, as in Figure 2. The second half should be "reversed," i.e., viewed with a 180° rotation. For
the upper bound, we notice that each half is necessarily a ballot walk. For the lower bound, we
need some sufficient condition on the two halves so that neither "overshoots" the other when the
two halves are glued together.

Let us recall the following classic theorem about ballot sequences (e.g., see [10]).

Theorem 3.3 (Ballot Theorem). Let $p > q$. The number of ballot sequences with $p$ 1's and $q$ 0's,
or equivalently the number of ballot walks with $p$ steps of the form $(1, 1)$ and $q$ steps of the form
$(1, -1)$, is equal to

$$\binom{p+q-1}{p-1} - \binom{p+q-1}{p} = \frac{p-q}{p+q}\binom{p+q}{p}.$$
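
Both forms of the Ballot Theorem count can be compared against a brute-force enumeration for small $p$ and $q$. This is a minimal sketch with function names of our choosing.

```python
from itertools import combinations
from math import comb


def count_ballot(p, q):
    """Brute-force count of sequences with p ones and q zeros in which
    every nonempty prefix has strictly more ones than zeros."""
    n = p + q
    count = 0
    for ones in combinations(range(n), p):
        positions = set(ones)
        height = 0
        ok = True
        for i in range(n):
            height += 1 if i in positions else -1
            if height <= 0:
                ok = False
                break
        if ok:
            count += 1
    return count


def ballot_formula(p, q):
    """Difference-of-binomials form of the Ballot Theorem."""
    return comb(p + q - 1, p - 1) - comb(p + q - 1, p)


print(count_ballot(5, 3), ballot_formula(5, 3))  # prints: 14 14
```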

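Corollary 3.4. Let $0 \le a \le b$ be real numbers. The number of ballot walks with $n$ steps and whose
final height is inclusively between $a$ and $b$ is

$$\binom{n-1}{\lceil \frac{1}{2}(n+a) \rceil - 1} - \binom{n-1}{\lfloor \frac{1}{2}(n+b) \rfloor}.$$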

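Proof. We use the Ballot Theorem and sum over all $(p, q)$ with $p + q = n$ and $a \le 2p - n \le b$ to
find that the desired quantity is

$$\sum_{p = \lceil (n+a)/2 \rceil}^{\lfloor (n+b)/2 \rfloor} \left[ \binom{n-1}{p-1} - \binom{n-1}{p} \right] = \binom{n-1}{\lceil \frac{1}{2}(n+a) \rceil - 1} - \binom{n-1}{\lfloor \frac{1}{2}(n+b) \rfloor},$$

since the sum telescopes. □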
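
The count in Corollary 3.4 can be compared with a direct dynamic program over final heights. In this sketch the closed form is taken to be $\binom{n-1}{\lceil (n+a)/2 \rceil - 1} - \binom{n-1}{\lfloor (n+b)/2 \rfloor}$, which is our reading of the corollary obtained by telescoping the Ballot Theorem; the function names are hypothetical.

```python
from math import ceil, comb, floor


def ballot_walks_by_height(n):
    """Map each final height h to the number of ballot walks with n steps ending at h."""
    ways = {0: 1}
    for _ in range(n):
        new = {}
        for h, c in ways.items():
            for nh in (h + 1, h - 1):
                if nh >= 1:  # a ballot walk stays strictly above its start
                    new[nh] = new.get(nh, 0) + c
        ways = new
    return ways


def brute_count(n, a, b):
    """Number of ballot walks with n steps whose final height lies in [a, b]."""
    return sum(c for h, c in ballot_walks_by_height(n).items() if a <= h <= b)


def corollary_count(n, a, b):
    """Closed form assumed for Corollary 3.4 (valid for n >= 1)."""
    return comb(n - 1, ceil((n + a) / 2) - 1) - comb(n - 1, floor((n + b) / 2))


print(brute_count(10, 2, 6), corollary_count(10, 2, 6))
```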



We will also use the following well-known fact about the normal approximation of binomial 
coefficients. It can be proved using either Stirling's formula or the Central Limit Theorem. 
Proposition 3.5. For any real number $t$,

$$\lim_{n \to \infty} \frac{\sqrt{n}}{2^n} \binom{n}{\lfloor \frac{1}{2}(n + t\sqrt{n}) \rfloor} = \sqrt{\frac{2}{\pi}}\, e^{-\frac{1}{2}t^2}. \qquad (3)$$
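
The normal approximation can be observed numerically. The sketch below assumes that (3) reads $\frac{\sqrt{n}}{2^n}\binom{n}{\lfloor (n + t\sqrt{n})/2 \rfloor} \to \sqrt{2/\pi}\, e^{-t^2/2}$, and it evaluates the left side through logarithms (using `math.lgamma`) to avoid overflow; the function names are ours.

```python
from math import exp, floor, lgamma, log, pi, sqrt


def scaled_binomial(n, t):
    """sqrt(n) / 2^n * C(n, floor((n + t*sqrt(n)) / 2)), computed via logarithms."""
    k = floor((n + t * sqrt(n)) / 2)
    log_binom = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(0.5 * log(n) - n * log(2) + log_binom)


def gaussian_limit(t):
    """The assumed limiting value sqrt(2/pi) * exp(-t^2 / 2)."""
    return sqrt(2 / pi) * exp(-t * t / 2)


for t in (0.0, 1.0, 2.0):
    print(t, scaled_binomial(10**6, t), gaussian_limit(t))
```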



3.1 Upper Bound

Lemma 3.6. The number of ballot walks with $n$ steps is $\binom{n-1}{\lceil n/2 \rceil - 1} \sim \frac{2^n}{\sqrt{2\pi n}}$.

Proof. This follows directly from Corollary 3.4 and Proposition 3.5. □

Let $n_0 = \lfloor n/2 \rfloor$ and $n_1 = \lceil n/2 \rceil$. A bidirectional ballot walk is necessarily a ballot walk of length
$n_0$ followed by the reverse of a ballot walk of length $n_1$. Therefore, the number of bidirectional
ballot walks with $n$ steps is at most

$$O\left(\frac{2^{n_0}}{\sqrt{n_0}}\right) O\left(\frac{2^{n_1}}{\sqrt{n_1}}\right) = O\left(\frac{2^n}{n}\right).$$

Thus we have proven the following upper bound on $B_n$.

Proposition 3.7. $B_n = O(2^n/n)$.



3.2 Lower Bound 

We know that the first half and the reverse of the second half of a bidirectional ballot walk are 
both ballot walks, but this alone is not enough to guarantee that the overall walk is a bidirectional 
ballot walk. So we place additional constraints on each half of the walk. 

Definition 3.8. Let $b$ be a positive integer. A $b$-bounded walk is a ballot walk that never goes
into the region $y > 2b$ and ends in the region $y > b$.

[Figure 3: An example of a $b$-bounded walk.]



Lemma 3.9. The concatenation of a $b$-bounded walk followed by the reverse of another $b$-bounded
walk is necessarily a bidirectional ballot walk.

Figure 4 is a "proof by picture" of the lemma. The $b$-boundedness ensures that neither half
overshoots the other.
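
Lemma 3.9 can also be confirmed exhaustively for short walks. In the sketch below a walk is a tuple of steps $\pm 1$, and the reverse of a walk is the same tuple read backwards; the function names are ours and the search is only meant for small lengths.

```python
from itertools import product


def heights(steps):
    """Heights visited by a walk that starts at 0; each step is +1 or -1."""
    out = [0]
    for s in steps:
        out.append(out[-1] + s)
    return out


def is_ballot(steps):
    """The starting point is the unique lowest point of the walk."""
    return all(h > 0 for h in heights(steps)[1:])


def is_b_bounded(steps, b):
    """Ballot walk that never exceeds height 2b and ends above height b."""
    hs = heights(steps)
    return is_ballot(steps) and max(hs) <= 2 * b and hs[-1] > b


def is_bidirectional(steps):
    """Both the walk and its reverse are ballot walks."""
    return is_ballot(steps) and is_ballot(steps[::-1])


def bounded_walks(n, b):
    """All b-bounded walks with n steps."""
    return [w for w in product((1, -1), repeat=n) if is_b_bounded(w, b)]


def lemma_3_9_holds(n0, n1, b):
    """Check every concatenation of a b-bounded walk with a reversed one."""
    return all(is_bidirectional(u + v[::-1])
               for u in bounded_walks(n0, b)
               for v in bounded_walks(n1, b))


print(lemma_3_9_holds(6, 7, 2))
```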

[Figure 4: "Proof by picture" of Lemma 3.9.]

Lemma 3.10. The number of $\lfloor \sqrt{n} \rfloor$-bounded walks of $n$ steps is $\Omega(2^n/\sqrt{n})$.



Proof. We see that $b$-bounded walks of $n$ steps are precisely ballot walks that end in the region
$b + 1 \le y \le 2b$ and never go into the region $y > 2b$. Using Corollary 3.4, we see that the number
of ballot walks with $n$ steps that end in $b + 1 \le y \le 2b$ is equal to

$$\binom{n-1}{\lceil \frac{1}{2}(n+b-1) \rceil} - \binom{n-1}{\lfloor n/2 \rfloor + b}.$$

Now we need to consider those ballot walks that end in $b < y \le 2b$ but go into $y > 2b$ at some
point in the walk. Let $(t, 2b + 1)$ be the last point in the walk that is in the region $y > 2b$. We can reflect
the portion of the walk after that point to get a ballot walk that ends in $y > 2b + 1$. See Figure
5 for an illustration. This map is injective since we can always get back to the original walk, but
it is not necessarily onto. Then, we know that the number of ballot walks that end in $b < y \le 2b$
but go into $y > 2b$ at some point is at most the number of ballot walks that end in $y \ge 2b + 2$. By
Corollary 3.4, the number of ballot walks that end in $y \ge 2b + 2$ is equal to $\binom{n-1}{\lceil n/2 \rceil + b}$.




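[Figure 5: Reflecting the last segment of a walk.]

Therefore, the number of $b$-bounded walks is at least

$$\binom{n-1}{\lceil \frac{1}{2}(n+b-1) \rceil} - \binom{n-1}{\lfloor n/2 \rfloor + b} - \binom{n-1}{\lceil n/2 \rceil + b}.$$

Let $b = \lfloor \sqrt{n} \rfloor$. Using Proposition 3.5 we have

$$\lim_{n \to \infty} \frac{\sqrt{n}}{2^n} \left[ \binom{n-1}{\lceil \frac{1}{2}(n+b-1) \rceil} - \binom{n-1}{\lfloor n/2 \rfloor + b} - \binom{n-1}{\lceil n/2 \rceil + b} \right] = \frac{1}{\sqrt{2\pi}} \left( e^{-1/2} - 2e^{-2} \right) > 0.$$

It follows that the number of $\lfloor \sqrt{n} \rfloor$-bounded walks is $\Omega(2^n/\sqrt{n})$. □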
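
The lower bound used in this proof can be tested against an exact count. The sketch below counts $b$-bounded walks by dynamic programming and compares the result with the three-binomial expression $\binom{n-1}{\lceil (n+b-1)/2 \rceil} - \binom{n-1}{\lfloor n/2 \rfloor + b} - \binom{n-1}{\lceil n/2 \rceil + b}$, which is our reading of the bound in the proof; the function names are hypothetical.

```python
from math import comb, isqrt


def count_b_bounded(n, b):
    """Exact number of ballot walks with n steps that stay in [1, 2b]
    after the start and end at a height greater than b."""
    ways = {0: 1}
    for _ in range(n):
        new = {}
        for h, c in ways.items():
            for nh in (h + 1, h - 1):
                if 1 <= nh <= 2 * b:
                    new[nh] = new.get(nh, 0) + c
        ways = new
    return sum(c for h, c in ways.items() if h > b)


def lower_bound(n, b):
    """Three-binomial lower bound; (n + b) // 2 equals ceil((n + b - 1) / 2)."""
    return (comb(n - 1, (n + b) // 2)
            - comb(n - 1, n // 2 + b)
            - comb(n - 1, (n + 1) // 2 + b))


for n in (25, 100, 400):
    b = isqrt(n)
    print(n, b, count_b_bounded(n, b) >= lower_bound(n, b))
```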
As before, we can form bidirectional ballot walks by concatenating two $b$-bounded walks, where
the second half is reversed. Let $n_0 = \lfloor n/2 \rfloor$ and $n_1 = \lceil n/2 \rceil$. Then, the number of bidirectional
ballot walks is at least

$$\Omega\left(\frac{2^{n_0}}{\sqrt{n_0}}\right) \Omega\left(\frac{2^{n_1}}{\sqrt{n_1}}\right) = \Omega\left(\frac{2^n}{n}\right).$$



Thus we have proven the following.

Proposition 3.11. $B_n = \Omega(2^n/n)$.

Propositions 3.7 and 3.11 together complete the proof of Proposition 3.2, and hence also Theorem 2.2.



4 Further remarks 

We believe that there is more potential to bidirectional ballot sequences than what is presented
here. Knowing that $B_n = \Theta(2^n/n)$, we can ask whether the ratio $nB_n/2^n$ approaches a limit.
Table 2 contains some values computed from an exact formula for $B_n$. The data suggest that
$nB_n/2^{n-2} \to 1$. This is indeed true. We have a proof of this fact, but our proof is rather long and
technical, so we do not present it here. The proof involves first finding an exact formula for $B_n$
using repeated applications of the reflection principle, and then some analysis to estimate the sum.
The data in Table 2 also suggest the asymptotic expansion

$$\frac{B_n}{2^n} = \frac{1}{4n} + \frac{1}{6n^2} + O\left(\frac{1}{n^3}\right),$$

which we pose as a conjecture.




Table 2: Some values of $nB_n/2^{n-2}$.

n        nB_n/2^{n-2}
100      1.0067268...
1000     1.00066729...
10000    1.0000666729...
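
The first row of Table 2 is within reach of a direct computation. The sketch below uses a cubic-time dynamic program of our own rather than the exact formula mentioned above, so it is practical for $n = 100$ but not for the larger rows; the function names are hypothetical.

```python
def count_bidirectional(n):
    """Number of bidirectional ballot sequences of length n (n >= 1)."""
    if n == 1:
        return 1
    total = 0
    for H in range(n, 1, -2):  # final heights with the same parity as n
        ways = [0] * (H + 1)
        ways[1] = 1  # the first step must go up to height 1
        for _ in range(n - 2):  # intermediate heights stay inside [1, H-1]
            new = [0] * (H + 1)
            for h in range(1, H):
                if ways[h]:
                    if h + 1 <= H - 1:
                        new[h + 1] += ways[h]
                    if h - 1 >= 1:
                        new[h - 1] += ways[h]
            ways = new
        total += ways[H - 1]  # the last step goes up from H-1 to H
    return total


def normalized_ratio(n):
    """The quantity n * B_n / 2^(n-2) tabulated in Table 2."""
    return n * count_bidirectional(n) / 2 ** (n - 2)


print(normalized_ratio(100))
```

The printed value can be compared with the entry for $n = 100$ in Table 2 and with the conjectured expansion, which predicts roughly $1 + 2/(3n)$.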



Bidirectional ballot sequences look superficially similar to Dyck paths and Catalan numbers. 
However, the former lack the nice enumerative properties enjoyed by the latter two. There does 
not seem to be any simple recursive structure in bidirectional ballot sequences, and we were unable 
to find any useful recurrence relations or generating functions for B n . This is what makes the 
enumeration of bidirectional ballot sequences particularly difficult. 

We can interpret bidirectional ballot sequences in terms of random walks. Suppose we take a
random walk of $n$ steps in $\mathbb{Z}$ where each step independently moves one unit to the left or the right,
each with probability 1/2. Let $p_n$ denote the probability that, among all the points visited by the
walk, the starting point is minimum and the ending point is maximum. Then $p_n = B_{n+2}/2^n \sim 1/n$
as $n \to \infty$.

Were it the case that $p_n \sim c/n$ for any other constant $c$, then perhaps the result might be much
less interesting. However, as it stands, we feel that $p_n \sim 1/n$ is not merely a coincidence, and we
believe that it deserves a better explanation than the calculation-heavy proof that we have. There
should be some natural, combinatorial explanation, perhaps along the lines of grouping all possible
walks into orbits of size mostly $n$ under some symmetry, so that almost every orbit contains exactly
one walk with the desired property. So far, we do not know of any such explanation.

Footnote 1: Indeed, if we only require the starting point to be minimum, then it is easy to show that $p_n \sim \sqrt{2/(\pi n)}$; the constants
here are not nearly as nice.

We are also currently investigating higher dimensional analogues of this type of random walk
problem. We have some experimental data that suggest the prevalence of the $1/n$ asymptotics
for analogous walks in higher dimensions. We currently have no proof or explanation of this
phenomenon.

The asymptotics related to bidirectional ballot sequences are very intriguing, and we hope to 
generate more interest in these objects. 